EN FR
EN FR


Section: New Results

Models and abstractions for distributed systems

This section summarizes the major results obtained by the ASAP team that relate to the foundations of distributed systems.

The weakest failure detector to implement a register in asynchronous systems with hybrid communication

Participants : Damien Imbs, Michel Raynal.

This work introduces an asynchronous crash-prone hybrid system model. The system is hybrid in the way the processes can communicate. On the one side, a process can send messages to any other process. On another side, the processes are partitioned into clusters and each cluster has its own read/write shared memory. In addition to the model, a main contribution of the work concerns the implementation of an atomic register in this system model. More precisely, a new failure detector (denoted MΣ) is introduced and it is shown that, when considering the information on failures needed to implement a register, this failure detector is the weakest. To that end, the work presents an MΣ-based algorithm that builds a register in the considered hybrid system model and shows that it is possible to extract MΣ from any failure detector-based algorithm that implements a register in this model. The work also (a) shows that MΣ is strictly weaker than Σ (which is the weakest failure detector to implement a register in a classical message-passing system) and (b) presents a necessary and sufficient condition to implement MΣ in a hybrid communication system.

This work has been published in SSS 2011 [38] .

The universe of symmetry breaking tasks

Participants : Damien Imbs, Michel Raynal.

Processes in a concurrent system need to coordinate using a shared memory or a message-passing subsystem in order to solve agreement tasks such as, for example, consensus or set agreement. However, coordination is often needed to “break the symmetry” of processes that are initially in the same state, for example, to get exclusive access to a shared resource, to get distinct names or to elect a leader.

This work introduces and studies the family of generalized symmetry breaking (GSB) tasks, that includes election, renaming and many other symmetry breaking tasks. Differently from agreement tasks, a GSB task is “inputless”, in the sense that processes do not propose values; the task only specifies the symmetry breaking requirement, independently of the system's initial state (where processes differ only on their identifiers). Among various results characterizing the family of GSB tasks, it is shown that (non adaptive) perfect renaming is universal for all GSB tasks.

This work was done in collaboration with Sergio Rajsbaum from the Universidad Nacional Autonoma de Mexico and was published in SIROCCO 2011 [36] .

Read invisibility, virtual world consistency and probabilistic permissiveness are compatible

Participants : Tyler Crain, Damien Imbs, Michel Raynal.

The aim of a Software Transactional Memory (STM) is to discharge the programmers from the management of synchronization in multiprocess programs that access concurrent objects. To that end, an STM system provides the programmer with the concept of a transaction. The job of the programmer is to design each process the application is made up of as a sequence of transactions. A transaction is a piece of code that accesses concurrent objects, but contains no explicit synchronization statement. It is the job of the underlying STM system to provide the illusion that each transaction appears as being executed atomically. Of course, for efficiency, an STM system has to allow transactions to execute concurrently. Consequently, due to the underlying STM concurrency management, a transaction commits or aborts.

This work studies the relation between two STM properties (read invisibility and permissiveness) and two consistency conditions for STM systems, namely, opacity and virtual world consistency. Both conditions ensure that any transaction (be it a committed or an aborted transaction) reads values from a consistent global state, a noteworthy property if one wants to prevent abnormal behavior from concurrent transactions that behave correctly when executed alone. A read operation issued by a transaction is invisible if it does not entail shared memory modifications. This is an important property that favors efficiency and privacy. An STM system is permissive (respectively probabilistically permissive) with respect to a consistency condition if it accepts (respectively accepts with positive probability) every history that satisfies the condition. This is a crucial property as a permissive STM system never aborts a transaction “for free”. The work first shows that read invisibility, probabilistic permissiveness and opacity are incompatible, which means that there is no probabilistically permissive STM system that implements opacity while ensuring read invisibility. It then shows that read invisibility, probabilistic permissiveness and virtual world consistency are compatible. To that end the work describes a new STM protocol called IR_VWC_P. This protocol presents additional noteworthy features: it uses only base read/write objects and locks which are used only at commit time; it satisfies the disjoint access parallelism property; and, in favorable circumstances, the cost of a read operation is O(1).

This work has been published in ICA3PP 2011 [29] .

Towards a universal construction for transaction-based multiprocess programs

Participants : Tyler Crain, Damien Imbs, Michel Raynal.

The aim of a Software Transactional Memory (STM) system is to discharge the programmer from the explicit management of synchronization issues. The programmer's job resides in the design of multiprocess programs in which processes are made up of transactions, each transaction being an atomic execution unit that accesses concurrent objects. The important point is that the programmer has to focus her/his efforts only on the parts of code which have to be atomic execution units without worrying on the way the corresponding synchronization has to be realized.

Non-trivial STM systems allow transactions to execute concurrently and rely on the notion of commit/abort of a transaction in order to solve their conflicts on the objects they access simultaneously. In some cases, the management of aborted transactions is left to the programmer. In other cases, the underlying system scheduler is appropriately modified or an underlying contention manager is used in order that each transaction be (“practically always” or with high probability) eventually committed.

This work paper proposed a deterministic STM system in which (1) every invocation of a transaction is executed exactly once and (2) the notion of commit/abort of a transaction remains unknown to the programmer. This system, which imposes restriction neither on the design of processes nor or their concurrency pattern, can be seen as a step towards the design of a deterministic universal construction to execute transaction-based multiprocess programs on top of a multiprocessor. Interestingly, the proposed construction is lock-free (in the sense that it uses no lock).

This work has been published in ICDCN 2012 [30] .

A transaction friendly binary search tree

Participants : Tyler Crain, Michel Raynal.

Transactions, which provide optimistic synchronization by avoiding the use of blocking, greatly simplify multicore programming. In fact, the programmer has simply to encapsulate sequential operations or existing critical sections into transactions to obtain a safe concurrent program. Programmers have thus started evaluating transactional memory using data structures originally designed for pessimistic (i.e., non-optimistic) synchronization, whose prominent example is the red-black tree library developed by Oracle Labs that is part of STAMP and microbench distributions. Unfortunately, existing data structures are badly suited for optimistic synchronization as they rely on strong structural invariants, like logarithmic tree depth, to bound the step complexity of pessimistically synchronized accesses. By contrast, this complexity does not apply to optimistically synchronized accesses thus making the invariants overly conservative. More dramatically, guaranteeing such invariants tends to increase the probability of aborting and restarting the same access before it completes. We introduced a concurrent binary search tree that breaks transiently its balance structural invariants for efficiency, a property we call transaction-friendly. This new tree outperforms the existing transaction-based version of the AVL and the red-black trees. Its key novelty stems from the decoupling of update operations: they are split into one transaction that modifies the abstraction state and multiple ones that restructure its tree implementation. The resulting transaction-friendly library trades aborts for few additional access steps and, in particular, it speeds up a transaction-based travel reservation application by up to 3:5X. This work was done in collaboration with Vincent Gramoli from EPFL Lausanne, and is described in [52] .

Relations linking failure detectors associated with k-set agreement in message-passing systems

Participants : Achour Mostefaoui, Michel Raynal, Julien Stainer.

The k-set agreement problem is a coordination problem where each process is assumed to propose a value and each process that does not crash has to decide a value such that each decided value is a proposed value and at most k different values are decided. While it can always be solved in synchronous systems, k-set agreement has no solution in asynchronous send/receive message-passing systems where up to tk processes may crash.

A failure detector is a distributed oracle that provides processes with additional information related to failed processes and can consequently be used to enrich the computability power of asynchronous send/receive message-passing systems. Several failure detectors have been proposed to circumvent the impossibility of k-set agreement in pure asynchronous send/receive message-passing systems. Considering three of them (namely, the generalized quorum failure detector Σ k , the generalized loneliness failure detector k and the generalized eventual leader failure detector Ω k ) this work investigates their computability power and the relations that link them. There are three mains contributions: (a) it shows that the failure detector Ω k and the eventual version of k have the same computational power; (b) it shows that k is realistic if and only if kn/2; and (c) it gives an exact characterization of the difference between k (that is too strong for k-set agreement) and Σ k (that is too weak for k-set agreement). This work was published at SSS 2011 [45] .

The price of anonymity: optimal consensus despite asynchrony, crash and anonymity

Participant : Michel Raynal.

This work [23] , done in collaboration with François Bonnet, from JAIST, Japan, addresses the consensus problem in asynchronous systems prone to process crashes, where additionally the processes are anonymous (they cannot be distinguished one from the other: they have no name and execute the same code). To circumvent the three computational adversaries (asynchrony, failures and anonymity) each process is provided with a failure detector of a class denoted ψ, that gives it an upper bound on the number of processes that are currently alive (in a non-anonymous system, the classes ψ and 𝒫 -the class of perfect failure detectors- are equivalent).

The first part presents a simple ψ-based consensus algorithm where the processes decide in 2t+1 asynchronous rounds (where t is an upper bound on the number of faulty processes). It then shows one of its main results, namely, 2t+1 is a lower bound for consensus in the anonymous systems equipped with ψ. The second contribution addresses early-decision. The paper presents and proves correct an early-deciding algorithm where the processes decide in min(2f+2,2t+1) asynchronous rounds (where f is the actual number of process failures). This leads to think that anonymity doubles the cost (wrt synchronous systems) and it is conjectured that min(2f+2,2t+1) is the corresponding lower bound.

The work finally considers the k-set agreement problem in anonymous systems. It first shows that the previous ψ-based consensus algorithm solves the k-set agreement problem in R t =2t k+1 asynchronous rounds. Then, considering a family of failure detector classes {ψ } 0<k that generalizes the class ψ(=ψ 0 ), the paper presents an algorithm that solves the k-set agreement in R t, =2t k-+1 asynchronous rounds. This last formula relates the cost (R t, ), the coordination degree of the problem (k), the maximum number of failures (t) and the the strength () of the underlying failure detector.

On the road to the Weakest Failure Detector for k-Set Agreement in Message-passing Systems

Participant : Michel Raynal.

In the k-set agreement problem, each process (in a set of n processes) proposes a value and has to decide a proposed value in such a way that at most k different values are decided. While this problem can easily be solved in asynchronous systems prone to t process crashes when k>t, it cannot be solved when kt. Since several years, the failure detector-based approach has been investigated to circumvent this impossibility. While the weakest failure detector class to solve the k-set agreement problem in read/write shared-memory systems has recently been discovered (PODC 2009), the situation is different in message-passing systems where the weakest failure detector classes are known only for the extreme cases k=1 (consensus) and k=n-1 (set agreement).

This work [22] , done in collaboration with François Bonnet, from JAIST, Japan, has four contributions whose aim is to help pave the way to discover the weakest failure detector class for k-set agreement in message-passing systems. These contributions are the following. (a) The first is a new failure detector class, denoted Π k , that is such that Π 1 =Σ×Ω (the weakest class for k=1), and Π n-1 = (the weakest class for k=n-1). (b) The second is an investigation of the structure of Π k that shows that Π k is the combination of two failures detector classes Σ k (that is new) and Ω k (they generalize the previous “quorums” and “eventual leaders” failure detectors classes, respectively). (c) The third contribution concerns Σ k that is shown to be necessary requirement (as far as information on failure is concerned) to solve the k-set agreement problem in message-passing systems. (d) Finally, the last contribution is a Π n-1 -based algorithm that solves the (n-1)-set agreement problem. This algorithm provides us with a new algorithmic insight on the way the (n-1)-set agreement problem can be solved in asynchronous message-passing systems. It is hoped that these contributions will help discover the weakest failure detector class for k-set agreement in message-passing systems.

A non-topological proof for the impossibility of k-set agreement.

Participant : Armando Castañeda.

This work was done in collaboration with Hagit Attiya, from Technion, Haifa, Israel. In the k-set agreement task each process proposes a value, and each correct process has to decide a value which was proposed, so that at most k distinct values are decided. Using topological arguments it has been proved that k-set agreement is unsolvable in the asynchronous wait-free read/write shared memory model, when k<n, the number of processes.

This work [34] focuses on a simple, non-topological impossibility proof of k-set agreement. The proof depends on two simple properties of the immediate snapshot executions, a subset of all possible executions, and on the well known handshaking lemma stating that every graph has an even number of vertices with odd degree.

The paper was presented in the 13th Int'l Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS'11) in Grenoble, France. The journal version of the paper was submitted to Theoretical Computer Science.

Enriching the reduction map of sub-consensus tasks

Participants : Armando Castañeda, Damien Imbs, Michel Raynal.

This work [51] was done in collaboration with Sergio Rajsbaum from the Universidad Nacional Autonoma de Mexico.

Understanding the relative computability power of tasks, in the presence of asynchrony and failures, is a central concern of distributed computing theory. In the wait-free case, where the system consists of n processes and any of them can fail by crashing, substantial attention has been devoted to understanding the relative power of the subconsensus family of tasks, which are too weak to solve consensus for two processes. The first major results showed that set agreement and renaming (except for some particular values of n) cannot be solved wait-free in read/write memory. Then it was proved that renaming is strictly weaker than set agreement (when n is odd).

This work considers a natural family of subconsensus tasks that includes set agreement, renaming and other generalized symmetry breaking (GSB) tasks. It extends previous results, and proves various new results about when there is a reduction and when not, among these tasks. Among other results, the work shows that there are incomparable subconsensus tasks.

Byzantine Consensus Decidability

Participants : Achour Mostefaoui, Michel Raynal.

Solving the consensus problem requires in one way or another that the underlying system satisfies synchrony assumptions. Considering a system of n processes where up to t<n/3 may commit Byzantine failures, we proposed in [26] a necessary and sufficient synchrony assumption to solve consensus.

Such a condition is formulated with the notions of a symmetric synchrony property and property ambiguity. A symmetric synchrony property is a set of graphs, where each graph corresponds to a set of bi-directional eventually synchronous links among correct processes. Intuitively, a property is ambiguous if it contains a graph whose connected components are such that it is impossible to distinguish a connected component that contains correct processes only from a connected component that contains faulty processes only. The paper connects then the notion of a symmetric synchrony property with the notion of eventual bi-source, and shows that the existence of a virtual [t+1]bi-source is a necessary and sufficient condition to solve consensus in presence of up to t Byzantine processes in systems with bi-directional links and message authentication. Finding necessary and sufficient synchrony conditions when links are timely in one direction only, or when processes cannot sign messages, still remains open (and very challenging) problems.

Solving k-set agreement in message-passing systems

Participants : Achour Mostefaoui, Michel Raynal, Julien Stainer.

The k-set agreement problem is a coordination problem where each process is assumed to propose a value and each process that does not crash has to decide a value such that each decided value is a proposed value and at most k different values are decided. While it can always be solved in synchronous systems, k-set agreement has no solution in asynchronous send/receive message-passing systems where up to tk processes may crash.

A failure detector is a distributed oracle that provides processes with additional information related to failed processes and can consequently be used to enrich the computability power of asynchronous send/receive message-passing systems. Several failure detectors have been proposed to circumvent the impossibility of k-set agreement in pure asynchronous send/receive message-passing systems. Considering three of them (namely, the generalized quorum failure detector Σ k , the generalized loneliness failure detector k and the generalized eventual leader failure detector Ω k ), we investigated their computability power and the relations that link them in [45] . It has three mains contributions: (a) it shows that the failure detector Ω k and the eventual version of k have the same computational power; (b) it shows that k is realistic if and only if kn/2; and (c) it gives an exact characterization of the difference between k (that is too strong for k-set agreement) and Σ k (that is too weak for k-set agreement).

Efficient Implementations of Concurrent Objects

Participants : Achour Mostefaoui, Michel Raynal.

As introduced by Taubenfeld, a contention-sensitive implementation of a concurrent object is an implementation such that the overhead introduced by locking is eliminated in the common cases, i.e., when there is no contention or when the operations accessing concurrently the object are non-interfering. In [44] , we present a methodological construction of a contention-sensitive implementation of a concurrent stack. In a contention-free context a push or pop operation does not rest on a lock mechanism and needs only six accesses to the shared memory. In case of concurrency a single lock is required. Moreover, the implementation is starvation-free (any operation is eventually executed). The paper, that presents the algorithms in an incremental way, visits also a family of liveness conditions and important concurrency-related concepts such as the notion of an abortable object.